22 research outputs found

    QCMP: load balancing via in-network reinforcement learning

    Traffic load balancing is a long-standing networking challenge. The dynamism of traffic and the growing number of different workloads that flow through the network exacerbate the problem. This work presents QCMP, a reinforcement learning-based load balancing solution. QCMP is implemented within the data plane, providing dynamic policy adjustment with a quick response to changes in traffic. QCMP is implemented using P4 on a switch ASIC and using BMv2 in a simulation environment. Our results show that QCMP requires negligible resources, runs at line rate, and adapts quickly to changes in traffic patterns.
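    The core mechanism behind a system like QCMP can be illustrated with a plain Q-learning loop: the agent observes a congestion state, picks an egress path, and updates its value estimates from the outcome. The sketch below is our own toy illustration, not the paper's code; the states, paths, and reward shape are invented for the example.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_PATHS = 4
QUEUE_LEVELS = 3  # low / medium / high congestion buckets

# Q-table: congestion level -> per-path value estimates
q_table = [[0.0] * N_PATHS for _ in range(QUEUE_LEVELS)]

def choose_path(level: int) -> int:
    """Epsilon-greedy path selection over the Q-table row."""
    if random.random() < EPSILON:
        return random.randrange(N_PATHS)
    row = q_table[level]
    return row.index(max(row))

def update(level: int, path: int, reward: float, next_level: int) -> None:
    """Standard Q-learning update rule."""
    best_next = max(q_table[next_level])
    q_table[level][path] += ALPHA * (
        reward + GAMMA * best_next - q_table[level][path]
    )

# Toy environment: path 0 is persistently congested, the others are not.
random.seed(0)
level = 0
for _ in range(2000):
    path = choose_path(level)
    reward = -1.0 if path == 0 else 1.0   # penalise the congested path
    next_level = 2 if path == 0 else 0
    update(level, path, reward, next_level)
    level = next_level
```

    In the real system this table lives in switch registers and the update runs in the data plane; the Python version only shows the policy logic.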

    E-commerce bot traffic: in-network impact, detection, and mitigation

    In-network caching expedites data retrieval by storing frequently accessed data items within programmable data planes, thereby reducing data access latency. In this paper we explore a vulnerability of in-network caching to bot traffic, showing that it can significantly degrade performance. As bots constitute up to 70% of traffic on e-commerce platforms such as Amazon, this is a critical problem. To mitigate the effect of bot traffic we introduce In-network Caching Shelter (INCS), an in-network machine learning solution implemented on the NVIDIA BlueField-2 DPU. Our evaluation shows that INCS can detect malicious bot traffic patterns with accuracy of up to 94.72%. Furthermore, INCS takes smart actions to mitigate the effects of bot activity.

    Towards continuous threat defense: in-network traffic analysis for IoT gateways

    The widespread use of IoT devices has unveiled overlooked security risks. With the advent of ultra-reliable low-latency communications (URLLC) in 5G, fast threat defense is critical to minimize damage from attacks. IoT gateways, equipped with wireless/wired interfaces, serve as a vital frontline defense against emerging threats at the IoT edge. However, current gateways struggle with dynamic IoT traffic and have limited defense capabilities against attacks with changing patterns. In-network computing offers fast machine learning-based attack detection and mitigation within network devices, but leveraging its capability in IoT gateways requires new continuous learning capabilities and runtime model updates. In this work, we present P4Pir, a novel in-network traffic analysis framework for IoT gateways. P4Pir incorporates a programmable data plane into the IoT gateway, pioneering the utilization of in-network machine learning (ML) inference for fast mitigation. It facilitates continuous and seamless updates of in-network inference models within gateways. P4Pir is prototyped in the P4 language on a Raspberry Pi and a Dell Edge Gateway. With ML inference offloaded to the gateway's data plane, P4Pir's in-network approach achieves swift attack mitigation and lightweight deployment compared to prior ML-based solutions. Evaluation results using three public datasets show that P4Pir accurately detects and rapidly mitigates emerging attacks (>30% accuracy improvement and sub-millisecond mitigation time). The proposed model update method allows seamless runtime updates without disrupting network traffic.
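    A seamless runtime model update of the kind P4Pir describes is commonly realised by staging the new model off the fast path and then swapping it in atomically, so no packet ever sees a half-written table. The sketch below is our own simplified illustration of that pattern (the class and rule names are invented, not P4Pir's API); in-network inference is reduced to ordered predicate-to-label rules standing in for match-action entries.

```python
class InferenceTable:
    """Stand-in for a match-action inference table: first match wins."""
    def __init__(self, rules):
        self.rules = rules  # list of (predicate, label)

    def classify(self, pkt: dict) -> str:
        for predicate, label in self.rules:
            if predicate(pkt):
                return label
        return "benign"

# The active table serves traffic; the shadow table is built off-path.
active = InferenceTable([
    (lambda p: p["pkt_len"] > 1200 and p["syn"], "attack"),
])
shadow = None

def stage_update(new_rules):
    """Build the retrained model without touching the fast path."""
    global shadow
    shadow = InferenceTable(new_rules)

def commit_update():
    """Atomic swap: packets always see a complete, consistent table."""
    global active, shadow
    active, shadow = shadow, None

# A retrained model adds a rate-based pattern, then goes live in one step.
stage_update([
    (lambda p: p["pkt_len"] > 1200 and p["syn"], "attack"),
    (lambda p: p["rate"] > 1000, "attack"),
])
commit_update()
```

    On real hardware the swap is a table-pointer or version-bit flip driven by the control plane, but the invariant is the same: classification never observes a partially updated model.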

    GridWatch: a smart network for smart grid

    The adoption of decentralized energy market models facilitates the exchange of surplus power among local nodes in peer-to-peer settings. However, decentralized energy transactions within untrusted and non-transparent energy markets in modern Smart Grids expose vulnerabilities and are susceptible to attacks. One such attack is the False Data Injection Attack, where malicious entities intentionally inject misleading information into the system. To address this threat, this paper proposes GridWatch, an effective real-time in-network intelligent framework to detect false data injection attacks. GridWatch operates in a hybrid model: it deploys inference models both in programmable network devices and on the server to detect false data injection attacks. GridWatch was evaluated using a real-world dataset from Austin, Texas, and can detect false data injection attacks with 94.8% accuracy. On average, GridWatch performs 4 billion transactions per second with less than 1.8 microseconds of latency.

    In-network machine learning using programmable network devices: a survey

    Machine learning is widely used to solve networking challenges, ranging from traffic classification and anomaly detection to network configuration. However, machine learning also requires significant processing and often increases the load on both networks and servers. The introduction of in-network computing, enabled by programmable network devices, has made it possible to run applications within the network, providing higher throughput and lower latency. Soon after, in-network machine learning solutions started to emerge, enabling machine learning functionality within the network itself. This survey introduces the concept of in-network machine learning and provides a comprehensive taxonomy. The survey provides an introduction to the technology and explains the different types of machine learning solutions built upon programmable network devices. It explores the different types of machine learning models implemented within the network, and discusses related challenges and solutions. In-network machine learning can significantly benefit cloud computing and next-generation networks, and this survey concludes with a discussion of future trends.

    DINC: toward distributed in-network computing

    In-network computing provides significant performance benefits, load reduction, and power savings. Still, an in-network service's functionality is strictly limited to a single hardware device. Research has focused on enabling on-device functionality, with limited consideration of distributed in-network computing. This paper explores the applicability of distributed computing to in-network computing. We present DINC, a framework enabling distributed in-network computing, generating deployment strategies, overcoming resource constraints, and providing functionality guarantees across a network. It uses multi-objective optimization to provide a deployment strategy, slicing P4 programs accordingly. DINC was evaluated using seven different workloads on both data center and wide-area network topologies, demonstrating feasibility and scalability and providing efficient distribution plans within seconds.
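    The placement problem DINC solves can be made concrete with a deliberately simplified sketch: ordered program segments must be assigned to devices along a path without exceeding any device's resource budget, while preserving segment order so the sliced pipeline stays valid. DINC itself uses multi-objective optimization; the greedy first-fit below is only our toy stand-in, with invented segment and device names.

```python
# Hypothetical program segments as (name, resource cost)
segments = [("parse", 2), ("classify", 5), ("cache", 4), ("export", 1)]
# Hypothetical devices along the path as (name, capacity)
devices = [("tor", 6), ("agg", 5), ("core", 8)]

def place(segments, devices):
    """Greedy in-order placement under per-device capacity limits."""
    plan = {}
    used = {name: 0 for name, _ in devices}
    dev_iter = iter(devices)
    name, cap = next(dev_iter)
    for seg, cost in segments:
        # Advance to the next hop once the current device is full;
        # keeping segment order preserves pipeline semantics.
        while used[name] + cost > cap:
            name, cap = next(dev_iter)  # StopIteration => infeasible
        used[name] += cost
        plan[seg] = name
    return plan

plan = place(segments, devices)
```

    A real deployment planner additionally trades off objectives such as latency, device load, and duplication, which is why DINC formulates it as multi-objective optimization rather than first-fit.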

    Event-driven Temporal Models for Explanations - ETeMoX: Explaining Reinforcement Learning

    Modern software systems are increasingly expected to show higher degrees of autonomy and self-management to cope with uncertain and diverse situations. As a consequence, autonomous systems can exhibit unexpected and surprising behaviours. This is exacerbated by the ubiquity and complexity of Artificial Intelligence (AI)-based systems. This is the case for Reinforcement Learning (RL), where autonomous agents learn through trial and error how to find good solutions to a problem. Thus, the underlying decision-making criteria may become opaque to users who interact with the system and who may require explanations about the system's reasoning. Available work on eXplainable Reinforcement Learning (XRL) offers different trade-offs: e.g. for runtime explanations, the approaches are model-specific or can only analyse results after the fact. Different from these approaches, this paper aims to provide an online, model-agnostic approach for XRL towards trustworthy and understandable AI. We present ETeMoX, an architecture based on temporal models to keep track of the decision-making processes of RL systems. In cases where resources are limited (e.g. storage capacity or response time), the architecture also integrates complex event processing, an event-driven approach, to detect matches to event patterns that need to be stored, instead of keeping the entire history. The approach is applied to a mobile communications case study that uses RL for its decision-making. In order to test the generalisability of our approach, three variants of the underlying RL algorithms are used: Q-Learning, SARSA and DQN. The encouraging results show that, using the proposed configurable architecture, RL developers are able to obtain explanations about the evolution of a metric and the relationships between metrics, and to track situations of interest happening over time windows.

    Reward-Reinforced Generative Adversarial Networks for Multi-agent Systems

    Multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a persistent obstacle for collaborative multi-agent systems, where learning affects the behaviour of more than one agent. A number of nonlinear function approximation methods have been proposed for solving the Bellman equation, which describes a recursive formulation of the optimal policy. However, how to leverage the value distribution in reinforcement learning, and how to improve the efficiency and efficacy of such systems, remain a challenge. In this work, we developed a reward-reinforced generative adversarial network to represent the distribution of the value function, replacing the approximation of Bellman updates. We demonstrate that our method is resilient and outperforms other conventional reinforcement learning methods. The method is also applied to a practical case study: maximising the number of user connections to autonomous airborne base stations in a mobile communication network. Our method maximises the data likelihood using a cost function under which agents have optimal learned behaviours. This reward-reinforced generative adversarial network can be used as a generic framework for multi-agent learning at the system level.

    IIsy: hybrid in-network classification using programmable switches

    The soaring use of machine learning leads to increasing processing demands. As data volume keeps growing, providing classification services with good machine learning performance, high throughput, low latency, and minimal equipment overheads becomes a challenge. Offloading machine learning tasks to network switches can be a scalable solution to this problem, providing high throughput and low latency. However, network devices are resource constrained and lack support for machine learning functionality. In this paper, we introduce IIsy, a novel tool for mapping machine learning classification models to off-the-shelf switches. Using an efficient encoding algorithm, IIsy enables fitting a range of classification models on switches, coexisting with standard switch functionality. To overcome resource constraints, IIsy adopts a hybrid approach for ensemble models, running a small model on a switch and a large model on the backend. The evaluation shows that IIsy achieves near-optimal classification results with minimal resource overheads, while reducing the load on the backend by 70% for data-intensive use cases.
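    Two ideas from the abstract can be sketched concretely: a trained classifier flattened into the priority-ordered range-match entries a switch table can evaluate, and a hybrid mode where low-confidence packets are escalated to a larger backend model. The feature, thresholds, labels, and confidence cutoff below are invented for illustration and are not IIsy's actual encoding.

```python
# A hypothetical depth-2 decision tree over packet size, flattened
# into ordered range-match entries (first match wins):
#   size <= 128  -> "control"
#   size <= 1024 -> "web"
#   else         -> "bulk"
TABLE = [
    ((0, 128), "control"),
    ((129, 1024), "web"),
    ((1025, 65535), "bulk"),
]

def lookup(size: int) -> str:
    """Priority-ordered range match, as a switch table performs it."""
    for (lo, hi), label in TABLE:
        if lo <= size <= hi:
            return label
    return "unknown"

def hybrid_classify(size: int, confidence: float, backend) -> str:
    """Hybrid mode sketch: decide on-switch when the small model is
    confident, otherwise escalate to the larger backend model."""
    label = lookup(size)
    return label if confidence >= 0.8 else backend(size)
```

    In practice the encoding handles multiple features and model types (trees, ensembles, SVMs), but the principle is the same: inference becomes table lookups that coexist with normal forwarding.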

    Linnet: limit order books within switches

    Financial trading nowadays often relies on machine learning. However, many trading applications require very short response times, which cannot always be supported by traditional machine learning frameworks. We present Linnet, which provides financial market prediction within programmable switches. Linnet builds limit order books from high-frequency market data feeds within the switch and uses them for machine learning-based market prediction. Linnet demonstrates the potential to predict future stock price movements with high accuracy and low latency, increasing financial gains.
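    The limit order book at the heart of Linnet is a simple data structure: resting buy orders (bids) and sell orders (asks) sorted by price, from which features such as best bid/ask and mid-price are derived for prediction. The minimal Python sketch below is ours for clarity only; Linnet itself maintains this structure in P4 on the switch, and all prices and quantities are illustrative.

```python
import heapq

class OrderBook:
    """Minimal limit order book: heaps of (price, quantity) levels."""
    def __init__(self):
        self._bids = []  # max-heap via negated prices
        self._asks = []  # min-heap

    def add(self, side: str, price: float, qty: int) -> None:
        if side == "bid":
            heapq.heappush(self._bids, (-price, qty))
        else:
            heapq.heappush(self._asks, (price, qty))

    def best_bid(self) -> float:
        return -self._bids[0][0]

    def best_ask(self) -> float:
        return self._asks[0][0]

    def mid_price(self) -> float:
        # Mid-price is a common input feature for price-movement models.
        return (self.best_bid() + self.best_ask()) / 2

book = OrderBook()
book.add("bid", 99.5, 10)
book.add("bid", 99.0, 5)
book.add("ask", 100.5, 8)
book.add("ask", 101.0, 3)
```

    Keeping this structure on the switch means features are available the instant a market-data packet arrives, which is what enables the low prediction latency the abstract describes.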